Relative phase information for detecting human speech and spoofed speech
نویسندگان
چکیده
The detection of human and spoofed (synthetic/converted) speech has started to receive more attention. In this study, relative phase information extracted from a Fourier spectrum is used to detect human and spoofed speech. Because original/natural phase information is almost entirely lost in spoofed speech using current synthesis/conversion techniques, a modified group delay based feature, the frequency derivative of the phase spectrum, has been shown effective for detecting human speech and spoofed speech. The modified group delay based phase contains both the magnitude spectrum and phase information. Therefore, the relative phase information, which contains only phase information, is expected to achieve a better spoofing detection performance. In this study, the relative phase information is also combined with the Mel-Frequency Cepstral Coefficient (MFCC) and modified group delay. The proposed method was evaluated using the “ASVspoof 2015: Automatic Speaker Verification Spoofing and Countermeasures Challenge” dataset. The results show that the proposed relative phase information significantly outperforms the MFCC and modified group delay. The equal error rate (EER) was reduced from 1.74% of MFCC, 0.83% of modified group delay to 0.013% of relative phase. By combining the relative phase with MFCC and modified group delay, the EER was reduced to 0.002%.
منابع مشابه
Anti-spoofing system: an investigation of measures to detect synthetic and human speech
Automatic Speaker Verification (ASV) systems are prone to spoofing attacks of various kinds. In this study, we explore the effects of different features and spoofing algorithms on a state-of-the-art i-vector speaker verification system. Our study is based on the standard dataset and evaluation protocols released as part of the ASVspoof 2015 challenge. We compare how different features perform w...
متن کاملInvestigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech
A speaker verification system should include effective precautions against malicious spoofing attacks, and although some initial countermeasures have been recently proposed, this remains a challenging research problem. This paper investigates discrimination between spoofed and genuine speech, as a function of frequency bands, across the speech bandwidth. Findings from our investigation inform s...
متن کاملThe Effect of Comprehensible Input and Comprehensible Output on the Accuracy and Complexity of Iranian EFL Learners’ Oral Speech
This study aimed at investigating the relative impact of comprehensible input and comprehensible output on the development of grammatical accuracy and syntactic complexity of Iranian EFL learners’ oral production. Participants were 60 female EFL learners selected from a whole population pool of 80 based on the standard test of IELTS. To investigate the research questions, the participants were ...
متن کاملCombining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech
Speech synthesis and voice conversion techniques can pose threats to current speaker verification (SV) systems. For this purpose, it is essential to develop front end systems that are able to distinguish human speech vs. spoofed speech (synthesized or voice converted). In this paper, for the ASVspoof 2015 challenge, we propose a detector based on combination of cochlear filter cepstral coeffici...
متن کاملStudy on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
متن کامل